Fijamos la semilla para poder reproducir los experimentos. También se fija el numero de CPU’s a utilizar.
set.seed(42)
options(mc.cores = 24)
Se importan las librerÃas a utilizar a lo largo de la notebook:
# install.packages(pacman)
# install.packages("https://cran.r-project.org/src/contrib/rstan_2.21.2.tar.gz",repos = NULL,type="source")
# sudo apt-get install libglpk-dev
library(pacman)
p_load(tidyverse, tidymodels, rsample, rstan, shinystan, rstanarm, devtools)
p_load_gh('adrianmarino/commons')
import('../src/dataset.R')
[1] "-> '../src/dataset.R' script loadded successfuly!"
import('../src/plot.R')
[1] "-> '../src/plot.R' script loadded successfuly!"
import('../src/model.R')
[1] "-> './bayesian_regression_predictor.R' script loadded successfuly!"
[1] "-> '../src/model.R' script loadded successfuly!"
Palmer Penguins
dataset <- load_dataset() %>% mutate_if(is.character, as.factor)
Rows: 344 Columns: 8
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): species, island, sex
dbl (5): bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, year
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dataset %>% glimpse()
Rows: 344
Columns: 8
$ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Torg…
$ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 37.8, 37.8, 41.1, 38.6, 34.6, 36.6…
$ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, 20.2, 17.1, 17.3, 17.6, 21.2, 21.1, 17.8…
$ flipper_length_mm <dbl> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186, 180, 182, 191, 198, 185, 195, 197, 184…
$ body_mass_g <dbl> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 4250, 3300, 3700, 3200, 3800, 4400, 3700…
$ sex <fct> male, female, female, NA, female, male, female, male, NA, NA, NA, NA, female, male, male, fe…
$ year <dbl> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 20…
Se enumeran y describen breve-mente cada variable que forma parte del dataset:
Variables Numéricas:
Variables Categóricas:
A continuación veamos los posibles valores de las variables categóricas:
show_values(dataset %>% select_if(negate(is.numeric)))
_____________
species n
=============
Adelie 152
Chinstrap 68
Gentoo 124
¯¯¯¯¯¯¯¯¯¯¯¯¯
_____________
island n
=============
Biscoe 168
Dream 124
Torgersen 52
¯¯¯¯¯¯¯¯¯¯¯¯¯
__________
sex n
==========
female 165
male 168
NA 11
¯¯¯¯¯¯¯¯¯¯
missings_summary(dataset)
hist_plots(dataset)
2021-12-05T16:11:28.009159Z [rsession-adrian] ERROR R.getOption: rstudio.errors.suppressed made from non-main thread; LOGGED FROM: SEXPREC* rstudio::r::options::getOption(const string&) src/cpp/r/ROptions.cpp:83
2021-12-05T16:11:28.009217Z [rsession-adrian] ERROR evaluateExpression called from thread other than main; LOGGED FROM: rstudio::core::Error rstudio::r::exec::{anonymous}::evaluateExpressionsUnsafe(SEXP, SEXP, SEXPREC**, rstudio::r::sexp::Protect*, rstudio::r::exec::{anonymous}::EvalType) src/cpp/r/RExec.cpp:140
2021-12-05T16:11:28.009231Z [rsession-adrian] ERROR r error 5 (R symbol not found) [symbol: .rs.rstudioapi.processRequest]; OCCURRED AT rstudio::core::Error rstudio::r::exec::RFunction::call(SEXP, bool, SEXPREC**, rstudio::r::sexp::Protect*) src/cpp/r/RExec.cpp:475; LOGGED FROM: virtual bool rstudio::session::async_r::AsyncRProcess::onContinue() src/cpp/session/SessionAsyncRProcess.cpp:255
2021-12-05T16:11:28.209519Z [rsession-adrian] ERROR R.getOption: rstudio.errors.suppressed made from non-main thread; LOGGED FROM: SEXPREC* rstudio::r::options::getOption(const string&) src/cpp/r/ROptions.cpp:83
2021-12-05T16:11:28.209562Z [rsession-adrian] ERROR evaluateExpression called from thread other than main; LOGGED FROM: rstudio::core::Error rstudio::r::exec::{anonymous}::evaluateExpressionsUnsafe(SEXP, SEXP, SEXPREC**, rstudio::r::sexp::Protect*, rstudio::r::exec::{anonymous}::EvalType) src/cpp/r/RExec.cpp:140
2021-12-05T16:11:28.209575Z [rsession-adrian] ERROR r error 5 (R symbol not found) [symbol: .rs.getRmdRuntime]; OCCURRED AT rstudio::core::Error rstudio::r::exec::RFunction::call(SEXP, bool, SEXPREC**, rstudio::r::sexp::Protect*) src/cpp/r/RExec.cpp:475; LOGGED FROM: std::__cxx11::string rstudio::session::modules::rmarkdown::{anonymous}::RenderRmd::getRuntime(const rstudio::core::FilePath&) src/cpp/session/modules/rmarkdown/SessionRMarkdown.cpp:443
Observaciones
box_plots(dataset)
Warning: Removed 1023 rows containing non-finite values (stat_boxplot).
Observaciones
** COMPLETAR
No se registran valores mas extremos que el mÃnimo y máximo valor en cada variables. Es decir que no encontramos outliers.
outliers(dataset, column='bill_length_mm')
$inf
[1] 32.1
$sup
[1] 59.6
outliers(dataset, column='bill_length_mm')
$inf
[1] 32.1
$sup
[1] 59.6
outliers(dataset, column='bill_depth_mm')
$inf
[1] 13.1
$sup
[1] 21.5
outliers(dataset, column='flipper_length_mm')
$inf
[1] 172
$sup
[1] 231
outliers(dataset, column='body_mass_g')
$inf
[1] 2700
$sup
[1] 6300
outliers(dataset, column='year')
$inf
[1] 2007
$sup
[1] 2009
bar_plots(dataset)
Observaciones
dataset <- dataset %>% drop_na()
missings_summary(dataset)
Warning: attributes are not identical across measure variables;
they will be dropped
corr_plot(dataset %>% dplyr::select(-year))
segmented_pairs_plot(dataset, segment_column='species')
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
train_test <- train_test_split(dataset, train_size = 0.7, shuffle = TRUE)
[1] "Train set size: 233"
[1] "Test set size: 100"
train_set <- train_test[[1]]
test_set <- train_test[[2]]
lineal_model_1 <- lm(
body_mass_g ~ bill_length_mm + bill_depth_mm + flipper_length_mm,
data = train_set
)
bayesion_model_1 <- stan(
model_code = "
data {
int<lower=1> obs_count;
vector<lower=1>[obs_count] x1;
vector<lower=1>[obs_count] x2;
vector<lower=1>[obs_count] x3;
vector[obs_count] y;
}
parameters {
real beta0;
real beta1;
real beta2;
real beta3;
real<lower=0> sigma;
}
model {
beta0 ~ normal(0, 8000);
beta1 ~ normal(0, 100);
beta2 ~ normal(0, 100);
beta3 ~ normal(0, 100);
sigma ~ exponential(0.1);
y ~ normal(beta0 + beta1 * x1 + beta2 * x2 + beta3 * x3, sigma);
}
",
data = list(
obs_count = nrow(train_set),
y = colvalues(train_set, 'body_mass_g'),
x1 = colvalues(train_set, 'bill_length_mm'),
x2 = colvalues(train_set, 'bill_depth_mm'),
x3 = colvalues(train_set, 'flipper_length_mm')
),
chains = 3,
iter = 300,
warmup = 180,
thin = 1
)
starting worker pid=19864 on localhost:11335 at 22:08:03.624
starting worker pid=19901 on localhost:11335 at 22:08:03.739
starting worker pid=19938 on localhost:11335 at 22:08:03.860
SAMPLING FOR MODEL '075149bad897790d99f42d10fc339899' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 3.5e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.35 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 300 [ 0%] (Warmup)
Chain 1: Iteration: 30 / 300 [ 10%] (Warmup)
Chain 1: Iteration: 60 / 300 [ 20%] (Warmup)
Chain 1: Iteration: 90 / 300 [ 30%] (Warmup)
Chain 1: Iteration: 120 / 300 [ 40%] (Warmup)
Chain 1: Iteration: 150 / 300 [ 50%] (Warmup)
Chain 1: Iteration: 180 / 300 [ 60%] (Warmup)
Chain 1: Iteration: 181 / 300 [ 60%] (Sampling)
Chain 1: Iteration: 210 / 300 [ 70%] (Sampling)
Chain 1: Iteration: 240 / 300 [ 80%] (Sampling)
Chain 1: Iteration: 270 / 300 [ 90%] (Sampling)
SAMPLING FOR MODEL '075149bad897790d99f42d10fc339899' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 5e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.5 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 300 [ 0%] (Warmup)
Chain 2: Iteration: 30 / 300 [ 10%] (Warmup)
Chain 1: Iteration: 300 / 300 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.899901 seconds (Warm-up)
Chain 1: 0.309153 seconds (Sampling)
Chain 1: 1.20905 seconds (Total)
Chain 1:
Chain 2: Iteration: 60 / 300 [ 20%] (Warmup)
Chain 2: Iteration: 90 / 300 [ 30%] (Warmup)
Chain 2: Iteration: 120 / 300 [ 40%] (Warmup)
Chain 2: Iteration: 150 / 300 [ 50%] (Warmup)
SAMPLING FOR MODEL '075149bad897790d99f42d10fc339899' NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 3.5e-05 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.35 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3:
Chain 3:
Chain 3: Iteration: 1 / 300 [ 0%] (Warmup)
Chain 2: Iteration: 180 / 300 [ 60%] (Warmup)
Chain 2: Iteration: 181 / 300 [ 60%] (Sampling)
Chain 3: Iteration: 30 / 300 [ 10%] (Warmup)
Chain 2: Iteration: 210 / 300 [ 70%] (Sampling)
Chain 2: Iteration: 240 / 300 [ 80%] (Sampling)
Chain 3: Iteration: 60 / 300 [ 20%] (Warmup)
Chain 2: Iteration: 270 / 300 [ 90%] (Sampling)
Chain 2: Iteration: 300 / 300 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 1.06276 seconds (Warm-up)
Chain 2: 0.620073 seconds (Sampling)
Chain 2: 1.68283 seconds (Total)
Chain 2:
Chain 3: Iteration: 90 / 300 [ 30%] (Warmup)
Chain 3: Iteration: 120 / 300 [ 40%] (Warmup)
Chain 3: Iteration: 150 / 300 [ 50%] (Warmup)
Chain 3: Iteration: 180 / 300 [ 60%] (Warmup)
Chain 3: Iteration: 181 / 300 [ 60%] (Sampling)
Chain 3: Iteration: 210 / 300 [ 70%] (Sampling)
Chain 3: Iteration: 240 / 300 [ 80%] (Sampling)
Chain 3: Iteration: 270 / 300 [ 90%] (Sampling)
Chain 3: Iteration: 300 / 300 [100%] (Sampling)
Chain 3:
Chain 3: Elapsed Time: 1.37176 seconds (Warm-up)
Chain 3: 0.586279 seconds (Sampling)
Chain 3: 1.95804 seconds (Total)
Chain 3:
params_1 <- c('beta0', 'beta1', 'beta2', 'beta3', 'sigma')
traceplot(bayesion_model_1, pars = params_1, inc_warmup = TRUE)
lm_vs_br_coeficients(lineal_model_1, bayesion_model_1, params_1)
vars_1 <- c('bill_length_mm', 'bill_depth_mm', 'flipper_length_mm')
models_validation(lineal_model_1, bayesion_model_1, params_1, vars_1, test_set)
bayesion_predictor_1 <- BayesianRegressionPredictor.from(bayesion_model_1, params_1, vars_1)
plot_compare_fit(
lineal_model_1,
bayesion_predictor_1,
train_set,
label_1='Regresion Lineal',
label_2='Regresion Bayesiana'
)
lineal_model_2 <- lm(
body_mass_g
~ bill_length_mm
+ bill_depth_mm
+ flipper_length_mm
+ sex,
data = train_set
)
Antes que anda transformamos la columna categórica a un one-hot encoding:
cat_cols <- c('sex')
train_set_cat <- train_set %>% dummify(cat_cols)
test_set_cat <- test_set %>% dummify(cat_cols)
train_set_cat
Construimos una matriz con todas las variables x + intercept.
to_model_input <- function(df) {
model.matrix(
body_mass_g
~ bill_length_mm
+ bill_depth_mm
+ flipper_length_mm
+ sex_female
+ sex_male,
data = df
)
}
train_X <- train_set_cat %>% to_model_input()
test_X <- test_set_cat %>% to_model_input()
bayesion_model_2 <- stan(
model_code = "
data {
int<lower=1> obs_count;
int<lower=1> coef_count;
matrix[obs_count,coef_count] X;
vector[obs_count] y;
}
parameters {
vector[coef_count] beta;
real<lower=0> sigma;
}
model {
beta[1] ~ normal(0, 2000);
beta[2] ~ normal(0, 30);
beta[3] ~ normal(0, 100);
beta[4] ~ normal(0, 100);
beta[5] ~ normal(0, 100);
beta[6] ~ normal(0, 100);
sigma ~ exponential(0.1);
y ~ normal(X * beta, sigma);
}
",
data = list(
obs_count = dim(train_X)[1],
coef_count = dim(train_X)[2],
y = colvalues(train_set_cat, 'body_mass_g'),
X = train_X
),
chains = 3,
iter = 300,
warmup = 180,
thin = 1
)
starting worker pid=20140 on localhost:11335 at 22:08:38.242
starting worker pid=20177 on localhost:11335 at 22:08:38.355
starting worker pid=20214 on localhost:11335 at 22:08:38.465
SAMPLING FOR MODEL '23ab4c58649acf62fe6953fd9b605e50' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 1.8e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.18 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 300 [ 0%] (Warmup)
Chain 1: Iteration: 30 / 300 [ 10%] (Warmup)
Chain 1: Iteration: 60 / 300 [ 20%] (Warmup)
Chain 1: Iteration: 90 / 300 [ 30%] (Warmup)
Chain 1: Iteration: 120 / 300 [ 40%] (Warmup)
Chain 1: Iteration: 150 / 300 [ 50%] (Warmup)
Chain 1: Iteration: 180 / 300 [ 60%] (Warmup)
Chain 1: Iteration: 181 / 300 [ 60%] (Sampling)
Chain 1: Iteration: 210 / 300 [ 70%] (Sampling)
Chain 1: Iteration: 240 / 300 [ 80%] (Sampling)
Chain 1: Iteration: 270 / 300 [ 90%] (Sampling)
Chain 1: Iteration: 300 / 300 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.409627 seconds (Warm-up)
Chain 1: 0.152377 seconds (Sampling)
Chain 1: 0.562004 seconds (Total)
Chain 1:
SAMPLING FOR MODEL '23ab4c58649acf62fe6953fd9b605e50' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 1.8e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.18 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 300 [ 0%] (Warmup)
Chain 2: Iteration: 30 / 300 [ 10%] (Warmup)
Chain 2: Iteration: 60 / 300 [ 20%] (Warmup)
Chain 2: Iteration: 90 / 300 [ 30%] (Warmup)
Chain 2: Iteration: 120 / 300 [ 40%] (Warmup)
Chain 2: Iteration: 150 / 300 [ 50%] (Warmup)
Chain 2: Iteration: 180 / 300 [ 60%] (Warmup)
Chain 2: Iteration: 181 / 300 [ 60%] (Sampling)
Chain 2: Iteration: 210 / 300 [ 70%] (Sampling)
Chain 2: Iteration: 240 / 300 [ 80%] (Sampling)
Chain 2: Iteration: 270 / 300 [ 90%] (Sampling)
Chain 2: Iteration: 300 / 300 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 0.481458 seconds (Warm-up)
Chain 2: 0.145764 seconds (Sampling)
Chain 2: 0.627222 seconds (Total)
Chain 2:
SAMPLING FOR MODEL '23ab4c58649acf62fe6953fd9b605e50' NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 2.8e-05 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.28 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3:
Chain 3:
Chain 3: Iteration: 1 / 300 [ 0%] (Warmup)
Chain 3: Iteration: 30 / 300 [ 10%] (Warmup)
Chain 3: Iteration: 60 / 300 [ 20%] (Warmup)
Chain 3: Iteration: 90 / 300 [ 30%] (Warmup)
Chain 3: Iteration: 120 / 300 [ 40%] (Warmup)
Chain 3: Iteration: 150 / 300 [ 50%] (Warmup)
Chain 3: Iteration: 180 / 300 [ 60%] (Warmup)
Chain 3: Iteration: 181 / 300 [ 60%] (Sampling)
Chain 3: Iteration: 210 / 300 [ 70%] (Sampling)
Chain 3: Iteration: 240 / 300 [ 80%] (Sampling)
Chain 3: Iteration: 270 / 300 [ 90%] (Sampling)
Chain 3: Iteration: 300 / 300 [100%] (Sampling)
Chain 3:
Chain 3: Elapsed Time: 0.443068 seconds (Warm-up)
Chain 3: 0.130323 seconds (Sampling)
Chain 3: 0.573391 seconds (Total)
Chain 3:
params_2 <- c('beta[1]', 'beta[2]', 'beta[3]', 'beta[4]', 'beta[5]', 'beta[6]', 'sigma')
traceplot(bayesion_model_2, inc_warmup = TRUE, pars = params_2)
lineal_model_2$coefficients
(Intercept) bill_length_mm bill_depth_mm flipper_length_mm sexmale
-1922.6152066 -0.5330962 -92.9216684 37.2016333 534.1583252
br_coeficients(bayesion_model_2, params_2)
vars_2 <- c('bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'sex_female','sex_male')
models_validation(
lineal_model_2,
bayesion_model_2,
params_2,
vars_2,
test_set,
test_set_cat
)
bayesion_predictor_2 <- BayesianRegressionPredictor.from(bayesion_model_2, params_2, vars_2)
plot_compare_fit(
lineal_model_2,
bayesion_predictor_2,
test_set,
test_set_cat,
label_1='Regresion Lineal',
label_2='Regresion Bayesiana'
)
plot_data(train_set)
train_set_with_outliers <- train_set %>%
mutate(flipper_length_mm = ifelse(
body_mass_g > 5600 ,
flipper_length_mm + (flipper_length_mm * runif(1, 0.2, 0.3)),
flipper_length_mm
))
plot_data(train_set_with_outliers)
lineal_model_3 <- lm(
body_mass_g ~ bill_length_mm + bill_depth_mm + flipper_length_mm,
data = train_set_with_outliers
)
plot_compare_fit(
lineal_model_1,
lineal_model_3,
train_set,
label_1='Regresión Lineal SIN outliers',
label_2='Regresión Lineal CON outliters'
)
lm_vs_lm_coeficients(lineal_model_1, lineal_model_3)
bayesion_model_3 <- stan(
model_code = "
data {
int<lower=1> obs_count;
vector<lower=1>[obs_count] x1;
vector<lower=1>[obs_count] x2;
vector<lower=1>[obs_count] x3;
vector[obs_count] y;
}
parameters {
real beta0;
real beta1;
real beta2;
real beta3;
real<lower=0> sigma;
}
model {
beta0 ~ normal(0, 8000);
beta1 ~ normal(0, 100);
beta2 ~ normal(0, 100);
beta3 ~ normal(0, 100);
sigma ~ exponential(0.1);
y ~ normal(beta0 + beta1 * x1 + beta2 * x2 + beta3 * x3, sigma);
}
",
data = list(
obs_count = nrow(train_set_with_outliers),
y = colvalues(train_set_with_outliers, 'body_mass_g'),
x1 = colvalues(train_set_with_outliers, 'bill_length_mm'),
x2 = colvalues(train_set_with_outliers, 'bill_depth_mm'),
x3 = colvalues(train_set_with_outliers, 'flipper_length_mm')
),
chains = 3,
iter = 300,
warmup = 180,
thin = 1
)
starting worker pid=20311 on localhost:11335 at 22:08:50.674
starting worker pid=20348 on localhost:11335 at 22:08:50.788
starting worker pid=20385 on localhost:11335 at 22:08:50.898
SAMPLING FOR MODEL '075149bad897790d99f42d10fc339899' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 3.4e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.34 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 300 [ 0%] (Warmup)
Chain 1: Iteration: 30 / 300 [ 10%] (Warmup)
Chain 1: Iteration: 60 / 300 [ 20%] (Warmup)
Chain 1: Iteration: 90 / 300 [ 30%] (Warmup)
Chain 1: Iteration: 120 / 300 [ 40%] (Warmup)
SAMPLING FOR MODEL '075149bad897790d99f42d10fc339899' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 3.5e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.35 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 300 [ 0%] (Warmup)
Chain 2: Iteration: 30 / 300 [ 10%] (Warmup)
Chain 1: Iteration: 150 / 300 [ 50%] (Warmup)
Chain 1: Iteration: 180 / 300 [ 60%] (Warmup)
Chain 1: Iteration: 181 / 300 [ 60%] (Sampling)
Chain 1: Iteration: 210 / 300 [ 70%] (Sampling)
Chain 1: Iteration: 240 / 300 [ 80%] (Sampling)
Chain 2: Iteration: 60 / 300 [ 20%] (Warmup)
SAMPLING FOR MODEL '075149bad897790d99f42d10fc339899' NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 4.8e-05 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.48 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3:
Chain 3:
Chain 3: Iteration: 1 / 300 [ 0%] (Warmup)
Chain 1: Iteration: 270 / 300 [ 90%] (Sampling)
Chain 3: Iteration: 30 / 300 [ 10%] (Warmup)
Chain 1: Iteration: 300 / 300 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.903207 seconds (Warm-up)
Chain 1: 0.310078 seconds (Sampling)
Chain 1: 1.21328 seconds (Total)
Chain 1:
Chain 2: Iteration: 90 / 300 [ 30%] (Warmup)
Chain 3: Iteration: 60 / 300 [ 20%] (Warmup)
Chain 2: Iteration: 120 / 300 [ 40%] (Warmup)
Chain 2: Iteration: 150 / 300 [ 50%] (Warmup)
Chain 2: Iteration: 180 / 300 [ 60%] (Warmup)
Chain 2: Iteration: 181 / 300 [ 60%] (Sampling)
Chain 2: Iteration: 210 / 300 [ 70%] (Sampling)
Chain 3: Iteration: 90 / 300 [ 30%] (Warmup)
Chain 2: Iteration: 240 / 300 [ 80%] (Sampling)
Chain 2: Iteration: 270 / 300 [ 90%] (Sampling)
Chain 3: Iteration: 120 / 300 [ 40%] (Warmup)
Chain 2: Iteration: 300 / 300 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 1.08664 seconds (Warm-up)
Chain 2: 0.271297 seconds (Sampling)
Chain 2: 1.35793 seconds (Total)
Chain 2:
Chain 3: Iteration: 150 / 300 [ 50%] (Warmup)
Chain 3: Iteration: 180 / 300 [ 60%] (Warmup)
Chain 3: Iteration: 181 / 300 [ 60%] (Sampling)
Chain 3: Iteration: 210 / 300 [ 70%] (Sampling)
Chain 3: Iteration: 240 / 300 [ 80%] (Sampling)
Chain 3: Iteration: 270 / 300 [ 90%] (Sampling)
Chain 3: Iteration: 300 / 300 [100%] (Sampling)
Chain 3:
Chain 3: Elapsed Time: 1.17765 seconds (Warm-up)
Chain 3: 0.337903 seconds (Sampling)
Chain 3: 1.51555 seconds (Total)
Chain 3:
params_3 <- c('beta0', 'beta1', 'beta2', 'beta3', 'sigma')
traceplot(bayesion_model_3, pars = params_3, inc_warmup = TRUE)
lm_vs_br_coeficients(lineal_model_3, bayesion_model_3, params_3)
vars_3 <- c('bill_length_mm', 'bill_depth_mm', 'flipper_length_mm')
models_validation(lineal_model_3, bayesion_model_3, params_3, vars_3, test_set)
bayesion_predictor_3 <- BayesianRegressionPredictor.from(bayesion_model_3, params_3, vars_3)
plot_compare_fit(
lineal_model_1,
bayesion_predictor_3,
train_set,
label_1='Regresion Lineal SIN outliers',
label_2='Regresion Bayesiana CON outliers'
)
En este aso entrenamos solo con el 10% de lo datos.
train_test <- train_test_split(dataset, train_size = 0.05, shuffle = TRUE)
[1] "Train set size: 16"
[1] "Test set size: 317"
train_set_4 <- train_test[[1]]
test_set_4 <- train_test[[2]]
plot_data(train_set_4)
lineal_model_4 <- lm(
body_mass_g ~ bill_length_mm + bill_depth_mm + flipper_length_mm,
data = train_set_4
)
bayesion_model_4 <- stan(
model_code = "
data {
int<lower=1> obs_count;
vector<lower=1>[obs_count] x1;
vector<lower=1>[obs_count] x2;
vector<lower=1>[obs_count] x3;
vector[obs_count] y;
}
parameters {
real beta0;
real beta1;
real beta2;
real beta3;
real<lower=0> sigma;
}
model {
beta0 ~ normal(0, 8000);
beta1 ~ normal(0, 100);
beta2 ~ normal(0, 100);
beta3 ~ normal(0, 100);
sigma ~ exponential(0.1);
y ~ normal(beta0 + beta1 * x1 + beta2 * x2 + beta3 * x3, sigma);
}
",
data = list(
obs_count = nrow(train_set_4),
y = colvalues(train_set_4, 'body_mass_g'),
x1 = colvalues(train_set_4, 'bill_length_mm'),
x2 = colvalues(train_set_4, 'bill_depth_mm'),
x3 = colvalues(train_set_4, 'flipper_length_mm')
),
chains = 3,
iter = 300,
warmup = 180,
thin = 1
)
starting worker pid=20482 on localhost:11335 at 22:09:00.432
starting worker pid=20519 on localhost:11335 at 22:09:00.548
starting worker pid=20556 on localhost:11335 at 22:09:00.663
SAMPLING FOR MODEL '075149bad897790d99f42d10fc339899' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 1.3e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.13 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 300 [ 0%] (Warmup)
Chain 1: Iteration: 30 / 300 [ 10%] (Warmup)
Chain 1: Iteration: 60 / 300 [ 20%] (Warmup)
Chain 1: Iteration: 90 / 300 [ 30%] (Warmup)
Chain 1: Iteration: 120 / 300 [ 40%] (Warmup)
Chain 1: Iteration: 150 / 300 [ 50%] (Warmup)
Chain 1: Iteration: 180 / 300 [ 60%] (Warmup)
Chain 1: Iteration: 181 / 300 [ 60%] (Sampling)
Chain 1: Iteration: 210 / 300 [ 70%] (Sampling)
Chain 1: Iteration: 240 / 300 [ 80%] (Sampling)
Chain 1: Iteration: 270 / 300 [ 90%] (Sampling)
Chain 1: Iteration: 300 / 300 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.178523 seconds (Warm-up)
Chain 1: 0.056149 seconds (Sampling)
Chain 1: 0.234672 seconds (Total)
Chain 1:
SAMPLING FOR MODEL '075149bad897790d99f42d10fc339899' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 1.4e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.14 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 300 [ 0%] (Warmup)
Chain 2: Iteration: 30 / 300 [ 10%] (Warmup)
Chain 2: Iteration: 60 / 300 [ 20%] (Warmup)
Chain 2: Iteration: 90 / 300 [ 30%] (Warmup)
Chain 2: Iteration: 120 / 300 [ 40%] (Warmup)
Chain 2: Iteration: 150 / 300 [ 50%] (Warmup)
Chain 2: Iteration: 180 / 300 [ 60%] (Warmup)
Chain 2: Iteration: 181 / 300 [ 60%] (Sampling)
Chain 2: Iteration: 210 / 300 [ 70%] (Sampling)
Chain 2: Iteration: 240 / 300 [ 80%] (Sampling)
Chain 2: Iteration: 270 / 300 [ 90%] (Sampling)
Chain 2: Iteration: 300 / 300 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 0.122112 seconds (Warm-up)
Chain 2: 0.030142 seconds (Sampling)
Chain 2: 0.152254 seconds (Total)
Chain 2:
SAMPLING FOR MODEL '075149bad897790d99f42d10fc339899' NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 1.3e-05 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.13 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3:
Chain 3:
Chain 3: Iteration: 1 / 300 [ 0%] (Warmup)
Chain 3: Iteration: 30 / 300 [ 10%] (Warmup)
Chain 3: Iteration: 60 / 300 [ 20%] (Warmup)
Chain 3: Iteration: 90 / 300 [ 30%] (Warmup)
Chain 3: Iteration: 120 / 300 [ 40%] (Warmup)
Chain 3: Iteration: 150 / 300 [ 50%] (Warmup)
Chain 3: Iteration: 180 / 300 [ 60%] (Warmup)
Chain 3: Iteration: 181 / 300 [ 60%] (Sampling)
Chain 3: Iteration: 210 / 300 [ 70%] (Sampling)
Chain 3: Iteration: 240 / 300 [ 80%] (Sampling)
Chain 3: Iteration: 270 / 300 [ 90%] (Sampling)
Chain 3: Iteration: 300 / 300 [100%] (Sampling)
Chain 3:
Chain 3: Elapsed Time: 0.143324 seconds (Warm-up)
Chain 3: 0.062283 seconds (Sampling)
Chain 3: 0.205607 seconds (Total)
Chain 3:
params_4 <- c('beta0', 'beta1', 'beta2', 'beta3', 'sigma')
traceplot(bayesion_model_4, pars = params_4, inc_warmup = TRUE)
Coeficientes de la regresión múltiple:
lineal_model_4$coefficients
(Intercept) bill_length_mm bill_depth_mm flipper_length_mm
-7582.81982 11.48403 75.08405 50.08308
Coeficientes descubiertos por la regresión múltiple bayesiana:
for(param in params_4) print(get_posterior_mean(bayesion_model_4, par=param)[4])
[1] -6615.826
[1] 9.556223
[1] 56.53997
[1] 47.18177
[1] 260.8093
vars_4 <- c('bill_length_mm', 'bill_depth_mm', 'flipper_length_mm')
models_validation(lineal_model_4, bayesion_model_4, params_4, vars_4, test_set)
bayesion_predictor_4 <- BayesianRegressionPredictor.from(bayesion_model_4, params_4, vars_4)
plot_compare_fit(
lineal_model_4,
bayesion_predictor_4,
train_set,
label_1='Regresion Lineal',
label_2='Regresion Bayesiana'
)
Definimos una distribución uniforme para el beta asociado a la variable flipper_length_mm.
starting worker pid=21107 on localhost:11335 at 22:10:46.690
starting worker pid=21144 on localhost:11335 at 22:10:46.809
starting worker pid=21181 on localhost:11335 at 22:10:46.920
SAMPLING FOR MODEL '8d4f6f44cd52d0fbdc2d079b71ab6e36' NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: exponential_lpdf: Random variable is -1.52376, but must be >= 0! (in 'model4ac92bc11fa9_8d4f6f44cd52d0fbdc2d079b71ab6e36' at line 20)
Chain 1:
Chain 1: Gradient evaluation took 3.5e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.35 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 1: Iteration: 181 / 1000 [ 18%] (Sampling)
Chain 1: Iteration: 280 / 1000 [ 28%] (Sampling)
SAMPLING FOR MODEL '8d4f6f44cd52d0fbdc2d079b71ab6e36' NOW (CHAIN 2).
Chain 2: Rejecting initial value:
Chain 2: Error evaluating the log probability at the initial value.
Chain 2: Exception: exponential_lpdf: Random variable is -0.423318, but must be >= 0! (in 'model4ac92bc11fa9_8d4f6f44cd52d0fbdc2d079b71ab6e36' at line 20)
Chain 2: Rejecting initial value:
Chain 2: Error evaluating the log probability at the initial value.
Chain 2: Exception: exponential_lpdf: Random variable is -0.0830189, but must be >= 0! (in 'model4ac92bc11fa9_8d4f6f44cd52d0fbdc2d079b71ab6e36' at line 20)
Chain 2:
Chain 2: Gradient evaluation took 6.5e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.65 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 1: Iteration: 380 / 1000 [ 38%] (Sampling)
Chain 1: Iteration: 480 / 1000 [ 48%] (Sampling)
SAMPLING FOR MODEL '8d4f6f44cd52d0fbdc2d079b71ab6e36' NOW (CHAIN 3).
Chain 3: Rejecting initial value:
Chain 3: Error evaluating the log probability at the initial value.
Chain 3: Exception: exponential_lpdf: Random variable is -0.0802033, but must be >= 0! (in 'model4ac92bc11fa9_8d4f6f44cd52d0fbdc2d079b71ab6e36' at line 20)
Chain 3: Rejecting initial value:
Chain 3: Error evaluating the log probability at the initial value.
Chain 3: Exception: exponential_lpdf: Random variable is -0.264727, but must be >= 0! (in 'model4ac92bc11fa9_8d4f6f44cd52d0fbdc2d079b71ab6e36' at line 20)
Chain 3: Rejecting initial value:
Chain 3: Error evaluating the log probability at the initial value.
Chain 3: Exception: exponential_lpdf: Random variable is -0.628905, but must be >= 0! (in 'model4ac92bc11fa9_8d4f6f44cd52d0fbdc2d079b71ab6e36' at line 20)
Chain 3:
Chain 3: Gradient evaluation took 4.4e-05 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.44 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3:
Chain 3:
Chain 3: Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 1: Iteration: 580 / 1000 [ 58%] (Sampling)
Chain 2: Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 1: Iteration: 680 / 1000 [ 68%] (Sampling)
Chain 3: Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 1: Iteration: 780 / 1000 [ 78%] (Sampling)
Chain 2: Iteration: 181 / 1000 [ 18%] (Sampling)
Chain 1: Iteration: 880 / 1000 [ 88%] (Sampling)
Chain 3: Iteration: 181 / 1000 [ 18%] (Sampling)
Chain 2: Iteration: 280 / 1000 [ 28%] (Sampling)
Chain 1: Iteration: 980 / 1000 [ 98%] (Sampling)
Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.763975 seconds (Warm-up)
Chain 1: 2.55397 seconds (Sampling)
Chain 1: 3.31795 seconds (Total)
Chain 1:
Chain 3: Iteration: 280 / 1000 [ 28%] (Sampling)
Chain 2: Iteration: 380 / 1000 [ 38%] (Sampling)
Chain 3: Iteration: 380 / 1000 [ 38%] (Sampling)
Chain 2: Iteration: 480 / 1000 [ 48%] (Sampling)
Chain 2: Iteration: 580 / 1000 [ 58%] (Sampling)
Chain 3: Iteration: 480 / 1000 [ 48%] (Sampling)
Chain 2: Iteration: 680 / 1000 [ 68%] (Sampling)
Chain 3: Iteration: 580 / 1000 [ 58%] (Sampling)
Chain 2: Iteration: 780 / 1000 [ 78%] (Sampling)
Chain 3: Iteration: 680 / 1000 [ 68%] (Sampling)
Chain 2: Iteration: 880 / 1000 [ 88%] (Sampling)
Chain 3: Iteration: 780 / 1000 [ 78%] (Sampling)
Chain 2: Iteration: 980 / 1000 [ 98%] (Sampling)
Chain 2: Iteration: 1000 / 1000 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 1.36463 seconds (Warm-up)
Chain 2: 4.11358 seconds (Sampling)
Chain 2: 5.47821 seconds (Total)
Chain 2:
Chain 3: Iteration: 880 / 1000 [ 88%] (Sampling)
Chain 3: Iteration: 980 / 1000 [ 98%] (Sampling)
Chain 3: Iteration: 1000 / 1000 [100%] (Sampling)
Chain 3:
Chain 3: Elapsed Time: 1.30831 seconds (Warm-up)
Chain 3: 4.62123 seconds (Sampling)
Chain 3: 5.92954 seconds (Total)
Chain 3:
br_vs_br_coeficients(bayesion_model_1, bayesion_model_5, params_5)
models_validation(lineal_model_1, bayesion_model_1, params_1, vars_1, test_set)
vars_5 <- c('bill_length_mm', 'bill_depth_mm', 'flipper_length_mm')
models_validation(lineal_model_1, bayesion_model_5, params_5, vars_5, test_set)
bayesion_predictor_5 <- BayesianRegressionPredictor.from(bayesion_model_5, params_5, vars_5)
plot_compare_fit(
bayesion_predictor_1,
bayesion_predictor_5,
train_set,
label_1='Regresion Bayesiana con dist informativa',
label_2='Regresion Bayesiana con dist menos informativa'
)
COMPLETAR